Software of the Month Club 1996 June

Software of the Month Club 1996 June.iso / pc / dos / dtp / aurora / regexp.dox

Text File | 1995-02-24 | 12.2 KB | 297 lines

Regular Expression Searching
────────────────────────────

The Regular Expression search option 'x' allows you to specify complex
search patterns when searching through buffers or strings. Option 'x'
can be specified in both end-user search prompts and macro language
searching functions (such as 'find' and 'replace').

Regular expression search patterns are created by combining normal
characters with regular expression 'operator' characters in the search
string. These operators take on a special meaning when the search
option 'x' is specified.

Each operator matches a pattern. There are operators which allow you to
anchor searches to the beginning or end of a line, match any character,
match a class of characters or its complement, optionally match a
pattern, match one of several patterns, match repeating patterns, and
match groups of patterns.

A rich set of regular expression operators is provided. The following
table lists and describes each of the operators:

  Operator  Description
  ────────  ───────────

  ^         Matches the beginning of a line. If the search is confined
            to a marked block with search option 'b', then this
            operator matches the beginning column of the mark.
            For example:

              ^          // matches the beginning of a line
              ^a         // matches 'a' at the beginning of a line
              ^apples    // matches 'apples' at the beginning of a line

  $         Matches the end of a line. If the search is confined to a
            marked block with search option 'b', then this operator
            matches the ending column of the mark or line.
            For example:

              $           // matches the end of a line
              o$          // matches 'o' at the end of a line
              oranges$    // matches 'oranges' at the end of a line

  .         Matches any character. For example:

              .      // matches any single character
              ..     // matches any two consecutive characters
              t.o    // matches 'two' or 'too', but not
                     // 'toe' or 'true'

  [ ]       Specifies a 'class' of characters that a single character
            can match. For example:

              [ab]       // matches 'a' or 'b'
              [abc12!]   // matches 'a', 'b', 'c', '1', '2', or '!'
              [AaZz]     // matches 'A', 'a', 'Z', or 'z'

            Note that the character class is always case-sensitive,
            even when the 'ignore case' search option 'i' is specified.

  [ - ]     Specifies a range of characters to match when used between
            characters in a class. Note that '-' is treated as a normal
            character if used as the first or last character of the
            class, or if used outside the class. For example:

              [a-z]          // matches characters 'a' through 'z'
              [-+0-9]        // matches characters '0' through '9' and
                             // '-' and '+'
              [a-zA-Z0-9]    // matches any alphanumeric character

  [~ ]      Specifies the complement of a character class against which
            to match a character. The '~' operator is only meaningful
            when used as the first character after the '[' bracket;
            otherwise it is treated as any other normal character.
            For example:

              [~ab]     // matches any character other than 'a' or 'b'
              [~12~]    // matches any character other than
                        // '1', '2', or '~'
              [~0-9]    // matches any non-numeric character

  ?         Optionally matches the preceding pattern. For example:

              thes?e       // matches 'thee' or 'these'
              the[sm]?e    // matches 'thee', 'these', or 'theme'

  |         This is the alternation ('or') operator. It matches the
            preceding or the following pattern. For example:

              the|in     // matches 'then' or 'thin'
                         // (but not 'the' or 'in')
              thes|me    // matches 'these' or 'theme'

            Multiple '|' operators can be chained together. The 'or-ed'
            patterns are searched in the order in which they are
            listed. For example:

              thes|m|r| |e
                // matches 'these', 'theme', 'there', or 'the e'

              {apples}|{oranges}|{bananas}
                // matches 'apples', 'oranges', or 'bananas' (see below
                // for a description of the grouping operator '{}')

  *         Matches zero or more occurrences of the preceding pattern,
            matching as few occurrences as possible (minimum closure).
            For example:

              fo*bar
                // matches 'fbar', 'fobar', 'foobar', 'fooobar', etc.

              apples.*oranges
                // matches any string starting with 'apples' and ending
                // with 'oranges'

            'Minimum closure' means that the shortest possible string
            is matched.
            For example, if the search pattern is 'ab*b' and the string
            to be searched is 'abbbbbbb', then 'ab' will be matched.
            (Thus, the '*' and '+' operators are seldom used at the end
            of a search string.)

  +         Matches one or more occurrences of the preceding pattern,
            matching as few occurrences as possible (minimum closure).
            For example:

              fo+bar
                // matches 'fobar', 'foobar', 'fooobar', etc.

              apples +oranges
                // matches any string starting with 'apples', followed
                // by one or more spaces, and ending with 'oranges'

  @         Matches zero or more occurrences of the preceding pattern,
            matching as many occurrences as possible (maximum closure).
            For example:

              a.@z
                // matches a string starting with 'a' and ending with
                // 'z', for the longest possible string

              '.@'
                // matches a single-quoted string for the longest
                // possible string

            'Maximum closure' means that the longest possible string is
            matched. For example, if the search pattern is 'ab@b' and
            the string to be searched is 'abbbbbbb', then 'abbbbbbb'
            will be matched.

  #         Matches one or more occurrences of the preceding pattern,
            matching as many occurrences as possible (maximum closure).
            For example:

              [a-zA-Z]#
                // matches the first occurrence of one or more
                // alphabetic characters, for the longest string
                // possible

              string2#
                // matches 'string2', 'string22', 'string222', etc.
                // (matching the longest possible string)

  { }       Groups characters or other patterns together as one
            pattern, so that regular expression operators can act on
            the entire pattern. For example:

              {apples}|{oranges}
                // matches 'apples' or 'oranges'

              another{ fine}? mess
                // matches 'another mess' or 'another fine mess'

              {ab}#
                // matches 'ab', 'abab', 'ababab', etc.

              {{ab}|{xy}}#
                // matches 'ab', 'xy', 'abab', 'abxy', 'xyab', 'abxyab',
                // etc.

            The '{}' operator also identifies or 'tags' patterns for
            replacement (see below).

  \         Indicates that the next character is to be taken literally
            and not used as a regular expression operator.
            For example:

              apples\++oranges
                // matches 'apples+oranges', 'apples++oranges', etc.

              whats all this then\?
                // matches "whats all this then?"

              c:\\file\.?txt
                // matches 'c:\filetxt' or 'c:\file.txt'

            The '\' operator can also be used to match specific
            characters:

              \a      matches the alert (beep) character (ASCII 7)
              \b      matches the backspace character (ASCII 8)
              \f      matches the formfeed character (ASCII 12)
              \n      matches the newline (linefeed) character (ASCII 10)
              \r      matches the return character (ASCII 13)
              \t      matches the tab character (ASCII 9)
              \v      matches the vertical tab character (ASCII 11)
              \xHH    matches the hexadecimal character 'HH'

            For example:

              \t\t       // matches two tab characters
              \x00|\r    // matches a binary zero or a return character
                         // (ASCII 13)

            The '\' operator is also used within a replacement pattern
            to reference a pattern which was tagged with the grouping
            '{}' operator (see below).

The following are a few additional examples of regular expression
search patterns:

  ^$
    // matches blank lines

  ^.*$
    // matches all the characters on any line

  ^.+$
    // matches all the characters on any non-blank line

  {if}|{else}|{for}|{while}|{switch}|{return}|{break}
    // matches a few 'C' language keywords

  [a-zA-Z0-9_]#
    // matches identifiers in most languages

  ^ *{function}|{key}.*$
    // matches AML function headers

  [a-zA-Z0-9_]# *= *[0-9]#
    // matches statements of the form: variable = number


Regular Expression Replacement Patterns
───────────────────────────────────────

A pattern which was 'tagged' by the grouping operator '{}' in the search
string of a regular expression search-and-replace operation can be
referenced in the replacement string by using the '\' replacement
operator.

Tagged patterns are numbered from 1 to 9 based on the leftmost '{'
symbol in the search string. The pattern number is specified after the
'\' character in the replacement string.
For example:

  search string:   "{.*}"        // changes double-quoted strings
  replace string:  '\1'          // to single-quoted strings

  search string:   {[a-zA-Z]#} +{[a-zA-Z]#}
  replace string:  \2 and \1

The example above reverses two adjacent alphabetic words and places the
word 'and' between them.

Specifying '\0' in the replacement string references the entire search
pattern. For example:

  search string:   ^.+$          // encloses non-blank lines
  replace string:  (\0)          // in parentheses

  search string:   [a-zA-Z0-9]#  // duplicates alphanumeric
  replace string:  \0\0          // identifiers

To enter the '\' character in a replacement string, enter it twice.
For example:

  search string:   ^             // insert '\\' at the beginning
  replace string:  \\\\          // of each line


Summary of Regular Expression Operators
───────────────────────────────────────

  Operator  Description
  ────────  ───────────

  ^         match the beginning of a line
  $         match the end of a line
  .         match any character
  [ ]       specify a character class
  [ - ]     specify a range of characters
  [~ ]      specify the complement of a character class
  ?         optionally match the preceding pattern
  |         the alternation ('or') operator
  *         match zero or more of the preceding pattern (min closure)
  +         match one or more of the preceding pattern (min closure)
  @         match zero or more of the preceding pattern (max closure)
  #         match one or more of the preceding pattern (max closure)
  { }       define a group or tag a pattern
  \         literal operator, or reference a tagged pattern
  \a        match the alert or beep character (ASCII 7)
  \b        match the backspace character (ASCII 8)
  \f        match the formfeed character (ASCII 12)
  \n        match the newline or linefeed character (ASCII 10)
  \r        match the return character (ASCII 13)
  \t        match the tab character (ASCII 9)
  \v        match the vertical tab character (ASCII 11)
  \xHH      match the hexadecimal character 'HH'
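
Readers who know other regular expression dialects may find it helpful
to see how the operators summarized above could be approximated with
Python's 're' module. The following is only a minimal sketch and is not
part of Aurora. The function names 'aurora_to_python' and
'aurora_replacement_to_python' are hypothetical. The sketch assumes that
'|' binds only to the single pattern on either side of it (as the
'the|in' example suggests), that a character class contains no ']' or
backslash escapes, and that at most one closure operator follows a
pattern. Search option 'b' and the rule that classes are always
case-sensitive are not modelled.

```python
import re

# Aurora closure operators -> Python quantifiers (minimum closure = lazy).
POSTFIX = {"*": "*?", "+": "+?", "@": "*", "#": "+", "?": "?"}
# Aurora character escapes. '\b' maps to \x08 because Python's \b is a
# word boundary, not a backspace, outside a character class.
ESCAPES = {"a": r"\x07", "b": r"\x08", "f": r"\x0c", "n": r"\n",
           "r": r"\r", "t": r"\t", "v": r"\x0b"}


def _atom(pat, pos):
    """Translate one pattern element at pos; return (text, next_pos)."""
    ch = pat[pos]
    if ch == "\\":
        if pos + 1 >= len(pat):
            raise ValueError("trailing backslash")
        nxt = pat[pos + 1]
        if nxt == "x":
            digits = pat[pos + 2:pos + 4]
            if len(digits) != 2:
                raise ValueError("\\x needs two hex digits")
            int(digits, 16)  # raises ValueError if not hexadecimal
            return r"\x" + digits, pos + 4
        return ESCAPES.get(nxt, re.escape(nxt)), pos + 2
    if ch == "[":
        end = pat.index("]", pos + 1)  # assumption: no ']' inside a class
        body = pat[pos + 1:end]
        negate = body.startswith("~")
        if negate:
            body = body[1:]
        # Escape characters that are special in a Python class only.
        body = "".join("\\" + c if c in "\\^[" else c for c in body)
        return "[" + ("^" if negate else "") + body + "]", end + 1
    if ch == "{":
        inner, end = _sequence(pat, pos + 1)
        if end >= len(pat):
            raise ValueError("missing '}'")
        return "(" + inner + ")", end + 1  # capturing, so tag numbers match
    if ch in "^$.":
        return ch, pos + 1
    return re.escape(ch), pos + 1


def _sequence(pat, pos):
    """Translate elements until the end of pat or an unmatched '}'."""
    slots = []    # one list of alternatives per position in the sequence
    join = False  # True right after '|': next element joins the last slot
    while pos < len(pat) and pat[pos] != "}":
        if pat[pos] == "|" and slots and not join:
            join, pos = True, pos + 1
            continue
        text, pos = _atom(pat, pos)
        if pos < len(pat) and pat[pos] in POSTFIX:
            text, pos = text + POSTFIX[pat[pos]], pos + 1
        if join:
            slots[-1].append(text)
        else:
            slots.append([text])
        join = False
    parts = (s[0] if len(s) == 1 else "(?:" + "|".join(s) + ")"
             for s in slots)
    return "".join(parts), pos


def aurora_to_python(pat):
    """Approximate an Aurora option-'x' search pattern as a Python regex."""
    text, pos = _sequence(pat, 0)
    if pos != len(pat):
        raise ValueError("unmatched '}'")
    return text


def aurora_replacement_to_python(repl):
    """Rewrite \\0..\\9 as \\g<N>; a doubled backslash is left unchanged."""
    return re.sub(r"\\(\\|\d)",
                  lambda m: r"\g<" + m[1] + ">" if m[1].isdigit() else m[0],
                  repl)
```

Under these assumptions, aurora_to_python('the|in') produces
'th(?:e|i)n', and the 'ab*b' search pattern becomes 'ab*?b', which finds
'ab' in 'abbbbbbb' just as the minimum closure description states. The
groups produced for '{}' are capturing and the alternation wrappers are
not, so tagged pattern numbers carry over to Python's group numbers.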